Shot-based object retrieval from video with compressed Fisher vectors
This paper addresses the problem of retrieving the shots in a database of video sequences that match a query image. Existing architectures are mainly based on the Bag of Words model, which matches the query image against a high-level representation of local features extracted from the video database. Such architectures, however, do not scale to very large databases. Recently, Fisher Vectors have shown promising results in large-scale image retrieval problems, but it is still not clear how they can best be exploited in video-related applications. In our work, we use compressed Fisher Vectors to represent the video shots and we show that the inherent correlation between video frames can be exploited effectively. Experiments show that our proposal achieves better performance at lower computational cost than similar architectures.
Long-term Tracking in the Wild: A Benchmark
We introduce the OxUvA dataset and benchmark for evaluating single-object
tracking algorithms. Benchmarks have enabled great strides in the field of
object tracking by defining standardized evaluations on large sets of diverse
videos. However, these works have focused exclusively on sequences that are
just tens of seconds in length and in which the target is always visible.
Consequently, most researchers have designed methods tailored to this
"short-term" scenario, which is poorly representative of practitioners' needs.
Aiming to address this disparity, we compile a long-term, large-scale tracking
dataset of sequences with average length greater than two minutes and with
frequent target object disappearance. The OxUvA dataset is much larger than the
object tracking datasets of recent years: it comprises 366 sequences spanning
14 hours of video. We assess the performance of several algorithms, considering
both the ability to locate the target and to determine whether it is present or
absent. Our goal is to offer the community a large and diverse benchmark to
enable the design and evaluation of tracking methods ready to be used "in the
wild". The project website is http://oxuva.net
Comment: To appear at ECCV 201
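The two abilities named in the abstract above, locating the target and deciding whether it is present, can be scored separately. The following is a minimal sketch under stated assumptions, not the benchmark's official evaluation code: the function names, the `(x_min, y_min, x_max, y_max)` box format, the use of `None` for "absent", and the 0.5 IoU threshold are all illustrative choices.

```python
from typing import Optional, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max). None means "target absent".
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def presence_aware_rates(
    truth: Sequence[Optional[Box]],
    preds: Sequence[Optional[Box]],
    iou_threshold: float = 0.5,  # assumed threshold, chosen for illustration
) -> Tuple[float, float]:
    """Return (true positive rate, true negative rate) over a sequence.

    A frame with a visible target counts as a true positive only if the
    tracker reports a box that overlaps the ground truth enough. A frame
    with an absent target counts as a true negative only if the tracker
    reports absence.
    """
    tp = present = tn = absent = 0
    for gt, pred in zip(truth, preds):
        if gt is None:
            absent += 1
            if pred is None:
                tn += 1
        else:
            present += 1
            if pred is not None and iou(gt, pred) >= iou_threshold:
                tp += 1
    tpr = tp / present if present else 0.0
    tnr = tn / absent if absent else 0.0
    return tpr, tnr


# Four frames: target visible in the first two, absent in the last two.
truth = [(0, 0, 10, 10), (0, 0, 10, 10), None, None]
preds = [(0, 0, 10, 10), None, None, (5, 5, 8, 8)]
rates = presence_aware_rates(truth, preds)
```

In this toy sequence the tracker finds the target in one of the two visible frames and correctly reports absence in one of the two absent frames, so both rates are 0.5. Reporting the two rates separately keeps a tracker that always outputs a box from looking good on sequences where the target frequently disappears.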
Class-Agnostic Counting
Nearly all existing counting methods are designed for a specific object
class. Our work, however, aims to create a counting model able to count any
class of object. To achieve this goal, we formulate counting as a matching
problem, enabling us to exploit the image self-similarity property that
naturally exists in object counting problems. We make the following three
contributions: first, a Generic Matching Network (GMN) architecture that can
potentially count any object in a class-agnostic manner; second, by
reformulating the counting problem as one of matching objects, we can take
advantage of the abundance of video data labeled for tracking, which contains
natural repetitions suitable for training a counting model. Such data enables
us to train the GMN. Third, to customize the GMN to different user
requirements, an adapter module is used to specialize the model with minimal
effort, i.e. using a few labeled examples, and adapting only a small fraction
of the trained parameters. This is a form of few-shot learning, which is
practical for domains where labels are limited due to requiring expert
knowledge (e.g. microbiology). We demonstrate the flexibility of our method on
a diverse set of existing counting benchmarks: specifically cells, cars, and
human crowds. The model achieves competitive performance on cell and crowd
counting datasets, and surpasses the state-of-the-art on the car dataset using
only three training images. When training on the entire dataset, the proposed
method outperforms all previous methods by a large margin.
Comment: Asian Conference on Computer Vision (ACCV), 201
Rhythm and Vowel Quality in Accents of English
In a sample of 27 speakers of Scottish Standard English two notoriously variable consonantal features are investigated: the contrast of /ʍ/ and /w/ and non-prevocalic /r/, the latter both in terms of its presence or absence and the phonetic form it takes, if present. The pattern of realisation of non-prevocalic /r/ largely confirms previously reported findings. But there are a number of surprising results regarding the merger of /ʍ/ and /w/ and the loss of non-prevocalic /r/: while the former is more likely to happen in younger speakers and females, the latter seems more likely in older speakers and males. This is suggestive of change in progress leading to a loss of the /ʍ/ - /w/ contrast, while the variation found in non-prevocalic /r/ follows an almost inverse sociolinguistic pattern that does not suggest any such change and is additionally largely explicable in language-internal terms. One phenomenon requiring further investigation is the curious effect direct contact with Southern English accents seems to have on non-prevocalic /r/: innovation on the structural level (i.e. loss) and conservatism on the realisational level (i.e. increased incidence of [r] and [r]) appear to be conditioned by the same sociolinguistic factors.
Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers
This paper improves state-of-the-art visual object trackers that use online
adaptation. Our core contribution is an offline meta-learning-based method to
adjust the initial deep networks used in online adaptation-based tracking. The
meta learning is driven by the goal of deep networks that can quickly be
adapted to robustly model a particular target in future frames. Ideally the
resulting models focus on features that are useful for future frames, and avoid
overfitting to background clutter, small parts of the target, or noise. By
enforcing a small number of update iterations during meta-learning, the
resulting networks train significantly faster. We demonstrate this approach on
top of two high-performance tracking approaches: the tracking-by-detection-based
MDNet and the correlation-based CREST. Experimental results on standard
benchmarks, OTB2015 and VOT2016, show that our meta-learned versions of both
trackers improve speed, accuracy, and robustness.
Comment: Code: https://github.com/silverbottlep/meta_tracker
Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos
Video annotation is expensive and time consuming. Consequently, datasets for
multi-person pose estimation and tracking are less diverse and have more sparse
annotations compared to large scale image datasets for human pose estimation.
This makes it challenging to learn deep learning based models for associating
keypoints across frames that are robust to nuisance factors such as motion blur
and occlusions for the task of multi-person pose tracking. To address this
issue, we propose an approach that relies on keypoint correspondences for
associating persons in videos. Rather than being trained on video data, the
network that estimates keypoint correspondences is trained on large-scale image
datasets for human pose estimation using self-supervision. Combined with a
top-down framework for human pose estimation, we use keypoint correspondences
to (i) recover missed pose detections and (ii) associate pose detections across
video frames. Our approach achieves state-of-the-art results for multi-frame
pose estimation and multi-person pose tracking on the PoseTrack and
PoseTrack data sets.
Comment: Submitted to ECCV 202
DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points
Multi-view stereo (MVS) is the golden mean between the accuracy of active
depth sensing and the practicality of monocular depth estimation. Cost volume
based approaches employing 3D convolutional neural networks (CNNs) have
considerably improved the accuracy of MVS systems. However, this accuracy comes
at a high computational cost which impedes practical adoption. Distinct from
cost volume approaches, we propose an efficient depth estimation approach by
first (a) detecting and evaluating descriptors for interest points, then (b)
learning to match and triangulate a small set of interest points, and finally
(c) densifying this sparse set of 3D points using CNNs. An end-to-end network
efficiently performs all three steps within a deep learning framework and is
trained with intermediate 2D image and 3D geometric supervision, along with
depth supervision. Crucially, our first step complements pose estimation using
interest point detection and descriptor learning. We demonstrate
state-of-the-art results on depth estimation with lower compute for different
scene lengths. Furthermore, our method generalizes to newer environments and
the descriptors output by our network compare favorably to strong baselines.
Code is available at https://github.com/magicleap/DELTAS
Comment: ECCV 202
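Step (b) in the abstract above triangulates matched interest points. As a point of reference only, the sketch below shows classical two-view midpoint triangulation, which is a swapped-in textbook technique and not the learned triangulation the paper proposes. The function name and the ray parameterisation (camera centre plus viewing direction) are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(
    c1: Sequence[float], d1: Sequence[float],
    c2: Sequence[float], d2: Sequence[float],
    eps: float = 1e-12,
) -> Vec3:
    """Midpoint of the shortest segment between two viewing rays.

    Ray i is c_i + s_i * d_i, where c_i is a camera centre and d_i is the
    direction through the matched interest point (hypothetical inputs).
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        # Parallel rays have no unique closest point.
        raise ValueError("rays are (nearly) parallel")
    # Closed-form minimiser of |(c1 + s*d1) - (c2 + t*d2)|^2.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two cameras one unit apart, both observing the same 3D point.
point = (0.5, 0.2, 2.0)
cam1, cam2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ray1 = tuple(p - c for p, c in zip(point, cam1))
ray2 = tuple(p - c for p, c in zip(point, cam2))
estimate = triangulate_midpoint(cam1, ray1, cam2, ray2)
```

With noise-free rays the estimate coincides with the observed point. With noisy matches the two rays no longer intersect, and the midpoint is one simple compromise between them.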
Hard Occlusions in Visual Object Tracking
Visual object tracking is among the hardest problems in computer vision, as
trackers have to deal with many challenging circumstances such as illumination
changes, fast motion, and occlusion, among others. A tracker is judged by its
performance on recent tracking datasets, e.g., VOT2019 and LaSOT. We argue
that while the recent datasets contain large sets of
annotated videos that to some extent provide a large bandwidth for training
data, the hard scenarios such as occlusion and in-plane rotation are still
underrepresented. For trackers to be brought closer to the real-world scenarios
and deployed in safety-critical devices, even the rarest hard scenarios must be
properly addressed. In this paper, we particularly focus on hard occlusion
cases and benchmark the performance of recent state-of-the-art trackers (SOTA)
on them. We created a small-scale dataset containing different categories
within hard occlusions, on which the selected trackers are evaluated. Results
show that hard occlusions remain a very challenging problem for SOTA trackers.
Furthermore, it is observed that tracker performance varies wildly between
different categories of hard occlusions, where a top-performing tracker on one
category performs significantly worse on a different category. The varying
nature of tracker performance based on specific categories suggests that the
common tracker rankings using averaged single performance scores are not
adequate to gauge tracker performance in real-world scenarios.
Comment: Accepted at ECCV 2020 Workshop RLQ-TO
Unveiling the Power of Deep Tracking
In the field of generic object tracking numerous attempts have been made to
exploit deep features. Despite all expectations, deep trackers are yet to reach
an outstanding level of performance compared to methods solely based on
handcrafted features. In this paper, we investigate this key issue and propose
an approach to unlock the true potential of deep features for tracking. We
systematically study the characteristics of both deep and shallow features, and
their relation to tracking accuracy and robustness. We identify the limited
data and low spatial resolution as the main challenges, and propose strategies
to counter these issues when integrating deep features for tracking.
Furthermore, we propose a novel adaptive fusion approach that leverages the
complementary properties of deep and shallow features to improve both
robustness and accuracy. Extensive experiments are performed on four
challenging datasets. On VOT2017, our approach significantly outperforms the
top-performing tracker from the challenge with a relative gain of 17% in EAO.
High Throughput Genetic Characterisation of Caucasian Patients Affected by Multi-Drug Resistant Rheumatoid or Psoriatic Arthritis
Rheumatoid and psoriatic arthritis (RA and PsA) are inflammatory rheumatic disorders characterised by a multifactorial etiology. To date, the genetic contributions to disease onset, severity and drug response are not clearly defined, and despite the development of novel targeted therapies, ~10% of patients still display poor treatment responses. We characterised a selected cohort of eleven non-responder patients, aiming to define the genetic contribution to drug resistance. An accurate clinical examination of the patients was coupled with several high-throughput genetic tests, including HLA typing, SNP array and Whole Exome Sequencing (WES). The analyses revealed that all the subjects carry very rare HLA phenotypes which contain HLA alleles associated with RA development (e.g., HLA-DRB1*04, DRB1*10:01 and DRB1*01). Additionally, six patients also carry PsA risk alleles (e.g., HLA-B*27:02 and B*38:01). WES analysis and SNP array revealed 23 damaging variants with 18 novel “drug-resistance” RA/PsA candidate genes. Eight patients carry likely pathogenic variants within common genes (CYP21A2, DVL1, PRKDC, ORAI1, UGT2B17, MSR1). Furthermore, “private” damaging variants were identified within 12 additional genes (WNT10A, ABCB7, SERPING1, GNRHR, NCAPD3, CLCF1, HACE1, NCAPD2, ESR1, SAMHD1, CYP27A1, CCDC88C). This multistep approach highlighted novel RA/PsA candidate genes and genotype-phenotype correlations potentially useful for clinicians in selecting the best therapeutic strategy.